TODO: Refine title

Initial Questions

TODO: Must have at least two questions. It is best to cover different types of problems, i.e., one regression and one classification.

Objective

TODO: Analysis: Identify the questions. What is the objective of processing this dataset? What answers are you interested in finding through this dataset?
TODO: Determine the details of the dataset (e.g., title, year, purpose, dimensions, content, structure, summary) by exploring the raw data.
TODO: Write a short introduction stating the objective of the project.

Data Cleaning and Preprocessing

TODO: Which section of the data do you need to tidy?
TODO: Prepare data for analysis by correcting the variables and contents of the data.
TODO: Put it all together as a new cleaned/processed dataset. For this task, you are also encouraged to explore R cleaning packages beyond those covered in the course (dplyr, tidyr, lubridate, etc.).

# if (!require('dplyr')) install.packages('dplyr'); library('dplyr')
# if (!require('tidyr')) install.packages('tidyr'); library('tidyr')
# Install each package only if it is not already available.
# A mirror is given explicitly so installation also works when knitting non-interactively.
if (!require('lubridate'))
  install.packages('lubridate', repos='https://cran.asia/')
if (!require('tidyquant'))
  install.packages('tidyquant', repos='https://cran.asia/')
if (!require('plotly'))
  install.packages('plotly', repos='https://cran.asia/')

library('lubridate')
library('plotly')
library('tidyquant')

Data Ingestion

# Remote source (MoH Malaysia public repository); a local copy of the file is read instead
# covid_malaysia_endpoint <- "https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/cases_malaysia.csv"
covid_malaysia_endpoint <- 'cases_malaysia.csv'
covid_malaysia_df <- read.csv(covid_malaysia_endpoint, header=TRUE)
# Parse the date column from character to Date
covid_malaysia_df$date <- as.Date(covid_malaysia_df$date, format="%Y-%m-%d")
str(covid_malaysia_df)
## 'data.frame':    708 obs. of  31 variables:
##  $ date                   : Date, format: "2020-01-25" "2020-01-26" ...
##  $ cases_new              : int  4 0 0 0 3 1 0 0 0 0 ...
##  $ cases_import           : int  4 0 0 0 3 1 0 0 0 0 ...
##  $ cases_recovered        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_active           : int  4 4 4 4 7 8 8 8 8 8 ...
##  $ cases_cluster          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_unvax            : int  4 0 0 0 3 1 0 0 0 0 ...
##  $ cases_pvax             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_fvax             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_boost            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_child            : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_adolescent       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_adult            : int  1 0 0 0 2 1 0 0 0 0 ...
##  $ cases_elderly          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_0_4              : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_5_11             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_12_17            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_18_29            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_30_39            : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_40_49            : int  1 0 0 0 0 1 0 0 0 0 ...
##  $ cases_50_59            : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_60_69            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_70_79            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_80               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cluster_import         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_religious      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_community      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_highRisk       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_education      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_detentionCentre: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_workplace      : int  NA NA NA NA NA NA NA NA NA NA ...
dim(covid_malaysia_df)
## [1] 708  31
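
The `str()` output above shows that the `cluster_*` columns begin with `NA`, so a per-column count of missing values is a natural first check before cleaning. The following is a minimal sketch, not the report's actual cleaning step. It uses a small made-up data frame (`toy_df`, a hypothetical name with illustrative values) that borrows a few of the real column names so that it runs without the CSV; on the real data, `covid_malaysia_df` would take its place.

```r
# Toy stand-in for covid_malaysia_df (values are made up for illustration)
toy_df <- data.frame(
  date = as.Date(c("2020-01-25", "2020-01-26", "2020-01-27")),
  cases_new = c(4L, 0L, 0L),
  cluster_import = c(NA, NA, 1L),
  cluster_workplace = c(NA, 2L, 3L)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy_df))

# Show only the columns that actually contain missing values
na_counts[na_counts > 0]
```

How to treat the missing values (dropping the columns, restricting the date range, or something else) depends on the questions chosen and is left open here.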

Exploratory Data Analysis

TODO: Results may include visualization, prediction, evaluation of models and discussion of output

A Brief Look at the Graph

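# Line chart of daily new cases.
# width is set in plot_ly() because setting it in layout() is deprecated.
fig <- plot_ly(covid_malaysia_df, type = 'scatter', mode = 'lines', width = 1200) %>%
  add_trace(x = ~date, y = ~cases_new, name = 'Daily New COVID-19 Cases') %>%
  layout(showlegend = FALSE)
# Note: this suppresses warnings for all code that runs afterwards
options(warn = -1)

# Style the axes and the plot background
fig <- fig %>%
  layout(
         xaxis = list(zerolinecolor = '#ffffff',
                      zerolinewidth = 2,
                      gridcolor = '#ffffff'),
         yaxis = list(zerolinecolor = '#ffffff',
                      zerolinewidth = 2,
                      gridcolor = '#ffffff'),
         plot_bgcolor = '#e5ecf6')

fig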
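
Daily counts tend to be noisy, so a trailing 7-day mean is one possible way to make the trend easier to read. The sketch below is an assumption about a useful next step, not part of the original analysis. It uses the first ten `cases_new` values shown in the `str()` output and base R's `stats::filter()`; the names `daily_cases` and `ma7` are hypothetical.

```r
# First ten cases_new values, as shown in the str() output
daily_cases <- c(4, 0, 0, 0, 3, 1, 0, 0, 0, 0)

# Trailing 7-day mean: sides = 1 averages each day with the 6 days before it
ma7 <- stats::filter(daily_cases, rep(1 / 7, 7), sides = 1)

# The first 6 entries are NA because a full 7-day window is not yet available
as.numeric(ma7)
```

On the full dataset, the same call on `covid_malaysia_df$cases_new` would give a series that could be added to the figure as a second trace.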

Machine Learning

TODO: Results may include visualization, prediction, evaluation of models and discussion of output

Conclusion

TODO: Conclusion

Presentation and Submission

TODO: Report: The submission will be an R Markdown document published on RPubs, and the link is to be submitted in Spectrum. The R Markdown document may include the following:

TODO: Only one member per group will submit the report.
TODO: Each group is required to prepare a 10-minute presentation with PowerPoint.
TODO: Both group members must present their parts.

End of Report